import SetupEnv from './common/setup-env.mdx';

# MCP server

Midscene provides a MCP server that allows AI assistants to control browsers, automate web tasks and write automation scripts for Midscene.

:::info MCP Introduction
MCP ([Model Context Protocol](https://modelcontextprotocol.io/introduction)) is a standardized way for AI models to interact with external tools and capabilities. MCP servers expose a set of tools that AI models can invoke to perform various tasks. In Midscene's case, these tools allow AI models to control browsers, navigate web pages, interact with UI elements, and more.
:::

## Use cases

* Control browsers to execute automation tasks
* Automatically generate Midscene automation scripts

### Examples

> Generate Midscene test cases for the Sauce Demo site

<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/ozpmyhn_lm_hymuPild/ljhwZthlaukjlkulzlp/midscene/en-midscene-mcp-Sauce-Demo.mp4" controls/>


## Setting up Midscene MCP

### Prerequisites

1. An OpenAI API key or another supported AI model provider. For more information, see [Choosing an AI Model](./choose-a-model).
2. For Chrome browser integration (Bridge Mode):
   - Install the Midscene Chrome extension (download from [Chrome Web Extension](https://chromewebstore.google.com/detail/midscenejs/gbldofcpkknbggpkmbdaefngejllnief?hl=en&utm_source=ext_sidebar))
   - Switch to "Bridge Mode" in the extension and click "Allow Connection"

### Configuration

Add the Midscene MCP server to your MCP configuration:

```json
{
  "mcpServers": {
    "mcp-midscene": {
      "command": "npx",
      "args": ["-y", "@midscene/mcp"],
      "env": {
        "MIDSCENE_MODEL_NAME": "REPLACE_WITH_YOUR_MODEL_NAME",
        "OPENAI_API_KEY": "REPLACE_WITH_YOUR_OPENAI_API_KEY",
        "MCP_SERVER_REQUEST_TIMEOUT": "800000"
      }
    }
  }
}
```

For more information about configuring AI models, see [Choosing an AI Model](./choose-a-model).

## Available tools

Midscene MCP provides the following browser automation tools:

| Category | Tool Name | Description |
|---------|---------|---------|
| **Navigation** | midscene_navigate | Navigate to a specified URL in the current tab |
| **Tab Management** | midscene_get_tabs | Get a list of all open browser tabs |
| | midscene_set_active_tab | Switch to a specific tab by ID |
| **Page Interaction** | midscene_aiTap | Click on an element described in natural language |
| | midscene_aiInput | Input text into a form field or element |
| | midscene_aiHover | Hover over an element |
| | midscene_aiKeyboardPress | Press a specific keyboard key |
| | midscene_aiScroll | Scroll the page or a specific element |
| **Verification and Observation** | midscene_aiWaitFor | Wait for a condition to be true on the page |
| | midscene_aiAssert | Assert that a condition is true on the page |
| | midscene_screenshot | Take a screenshot of the current page |
| **Playwright Code Example** | midscene_playwright_example | Provides Playwright code examples for Midscene |

### Navigation

- **midscene_navigate**: Navigate to a specified URL in the current tab
  ```
  Parameters:
  - url: The URL to navigate to
  ```

### Tab management

- **midscene_get_tabs**: Get a list of all open browser tabs, including their IDs, titles, and URLs
  ```
  Parameters: None
  ```

- **midscene_set_active_tab**: Switch to a specific tab by ID
  ```
  Parameters:
  - tabId: The ID of the tab to activate
  ```

### Page interaction

- **midscene_aiTap**: Click on an element described in natural language
  ```
  Parameters:
  - locate: Natural language description of the element to click
  ```

- **midscene_aiInput**: Input text into a form field or element
  ```
  Parameters:
  - value: The text to input
  - locate: Natural language description of the element to input text into
  ```

- **midscene_aiHover**: Hover over an element
  ```
  Parameters:
  - locate: Natural language description of the element to hover over
  ```

- **midscene_aiKeyboardPress**: Press a specific keyboard key
  ```
  Parameters:
  - key: The key to press (e.g., 'Enter', 'Tab', 'Escape')
  - locate: (Optional) Description of element to focus before pressing the key
  - deepThink: (Optional) If true, uses more precise element location
  ```

- **midscene_aiScroll**: Scroll the page or a specific element
  ```
  Parameters:
  - direction: 'up', 'down', 'left', or 'right'
  - scrollType: 'once', 'untilBottom', 'untilTop', 'untilLeft', or 'untilRight'
  - distance: (Optional) Distance to scroll in pixels
  - locate: (Optional) Description of the element to scroll
  - deepThink: (Optional) If true, uses more precise element location
  ```

### Verification and observation

- **midscene_aiWaitFor**: Wait for a condition to be true on the page
  ```
  Parameters:
  - assertion: Natural language description of the condition to wait for
  - timeoutMs: (Optional) Maximum time to wait in milliseconds
  - checkIntervalMs: (Optional) How often to check the condition
  ```

- **midscene_aiAssert**: Assert that a condition is true on the page
  ```
  Parameters:
  - assertion: Natural language description of the condition to check
  ```

- **midscene_screenshot**: Take a screenshot of the current page
  ```
  Parameters:
  - name: Name for the screenshot
  ```

## Common issues

### What advantages does Midscene MCP have over other browser MCPs?

- Midscene MCP supports Bridge mode by default, allowing you to **directly control your current browser** without needing to **log in again or download a browser**
- Midscene MCP includes built-in optimal prompt templates and operation execution practices for browser page control, providing **more stable and reliable browser automation experiences** compared to other MCP implementations
- Midscene MCP automatically generates execution case reports after completing tasks, allowing you to **view the execution process at any time**

### Local port conflicts when multiple clients are used

> Problem description

When users simultaneously use Midscene MCP in multiple local clients (Claude Desktop, Cursor MCP, etc.), port conflicts may occur causing server errors

> Solution

- Temporarily close the MCP server in the extra clients
- Execute the command:

```bash
# For macOS/Linux:
lsof -i:3766 | awk 'NR>1 {print $2}' | xargs -r kill -9

# For Windows:
FOR /F "tokens=5" %i IN ('netstat -ano ^| findstr :3766') DO taskkill /F /PID %i
```


### How to access Midscene execution reports

After each task execution, a Midscene task report is generated. You can open this HTML report directly from the command line:

```bash
# Replace the opened address with your report filename
open report_file_name.html
```

![image](https://lf3-static.bytednsdoc.com/obj/eden-cn/ozpmyhn_lm_hymuPild/ljhwZthlaukjlkulzlp/midscene/image.png)
